ABSTRACT

This paper describes a set of simulations which explore the way different features of lexical 
organisation affect the probability of finding a pair of associated words in a set of five 
randomly selected words. The simulation is equivalent to giving Ss a set of five words and 
asking if they can identify a pair of associated words among them. The paper speculates that it 
might be possible to extrapolate from a simple test of this sort and derive some interesting 
claims about the number of links connecting words in L2 speakers' lexicons. 

I. INTRODUCTION

This paper is the third in a series of studies in which we have used simulations of word
association behaviour as a way of investigating how L2 mental lexicons are organised. In the
first paper in this series (Wilks & Meara, 2002), we reported data from an experiment in
which we tested the ability of L1-English speakers to recognise associated pairs in small sets
of French words (the five-word task). The material used consisted of a 40-item questionnaire.
Each item in the questionnaire comprised a set of five words randomly chosen from the
Français Fondamental list: approximately the thousand most frequent words in French,
excluding grammatical items (Gougenheim et al., 1956). The participants were instructed to
read each set of words and circle any two words in the set that they considered to be
associated. A typical item might look like example one below:




In example 1, we would expect good speakers of French to circle cheminée (chimney) and
feu (fire). If the participants saw more than one pair of associated words in the set, they were
instructed to circle only the two words with the strongest link. If they found no links between
any of the words, they were instructed to write nothing and continue to the next item.

Alongside this group of L1-English speakers, we also ran a group of L1-French
speakers, who carried out the same task. Our intention was to compare the data of the
L1-English speakers with those of the native speakers of French, and we expected, of course,
to find that our L1-English speakers were less adept at identifying associated pairs than the
L1-French speakers were. Not surprisingly, this turned out to be the case (t=6.47, p<.001).
The data we reported are presented in Table 1 below.

                      Nonnative Speakers   Native Speakers
Mean hits                   19.00               30.90
Standard deviation           7.65                5.74
Number of items                40                  40
Number of Ss                   30                  30

Table 1: Mean hit rate per group.


These data clearly confirm that there is a difference between the two subject groups, and in
our original paper, we argued that the most obvious explanation of this difference is that the
association network of the L1 group is "denser" than that of the L2 group. The idea here is
that L1 words have more associative connections than L2 words do, and that the number of
connections directly affects the likelihood of Ss finding a pair of associated words in each
stimulus set. We pointed out in Wilks and Meara (2002) that this density metaphor is one that
frequently occurs in the literature on L2 word associations, but its implications are rarely
developed. Aitchison (1987), for example, talks about the lexicon as “a gigantic
multidimensional cobweb”, while for Bogaards (1994), “le lexique évoque l'image des
toiles d'araignée qui flottent au vent. Les matériaux lexicaux se présentent dans des structures
ultra-légères qui s'adaptent avec une souplesse et une flexibilité incroyables...” (the lexicon
calls to mind spiders' webs floating in the wind; lexical material comes in ultra-light
structures that adapt with incredible suppleness and flexibility). Similar
descriptions can easily be found in other widely-read authors, and most researchers in fact
appear content to operate on this descriptive level. Wilks and Meara, however, attempted to
show that it was possible to move beyond these imprecise metaphorical descriptions, and
develop more specific quantitative models instead. We did this by comparing the
experimental data with data generated by an association simulator.

The simulator was a computer program that modelled a small, 1000-word lexicon in
which each word was linked with a number of other words in the lexicon. The number of
links between each word and the rest of the lexicon - the NLinks parameter - could be varied,
and Wilks and Meara showed that the probability of two associated words appearing in a small
set of words varied with the value of this parameter. We then used these data to look again at
the data generated by real subjects, and estimated what the real data implied about the density
of interword connections in the mental lexicons of our test takers. Our initial guess had been
that the L1-English speakers would have relatively few connections between words in their L2
lexicons, perhaps as few as four or five. However, the results generated by the simulator
forced us to revise that estimate. We concluded that the data implied a much denser set of
connections, even for L2 speakers, perhaps as many as 30 or 40 links for each word. Our
2002 paper considered the implications of this for the way we normally interpret word
association data generated by L2 speakers, and we concluded that the density of connections
between words would have to be considerably higher than most researchers assumed it to be.
This had significant implications for the way we thought about word association networks in
an L2.


II. EARLIER SIMULATIONS 

In our original paper, the simulator that we worked with consisted not of real words, but of a
large array of numbers, which we considered to be the equivalent of "words" in a real lexicon.
Our model lexicon consisted of 1000 "words": each word was linked randomly to a number of
other words, which we considered to be associates of the original word. The overall structure
of our model lexicon looks something like Table 2.


word 1       123   145   160
word 2        99   182   279
word 3       129   182   761
...
word 999     135   856   687
word 1000     72    65   321

Table 2: Part of a simulated lexicon where each “word” is
randomly associated with a number of other “words”.


Here, each word is associated with three other words: word 1 is associated with word 123, 
word 145 and word 160; word 2 is associated with word 99, word 182, and word 279; and so 
on. In a simulation, it is a straightforward matter to vary the number of association links: the 
number of associations appears as a parameter in the model, and developing a model with 
four, five, six or more associates for each word is merely a matter of changing the value of 
this parameter, and setting up a new model with the relevant new parameter. 
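
The fixed-N model lexicon described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the original simulator: the function name build_fixed_lexicon, the seed argument, and the decision to exclude self-links and duplicate links are all assumptions, since the text does not say how the original program handled these details.

```python
import random

def build_fixed_lexicon(n_words=1000, n_links=3, seed=None):
    """Return {word: set of associates}, where every word has exactly
    n_links randomly chosen associates.

    Assumptions (not stated in the paper): a word is never its own
    associate, and the same associate is never listed twice.
    """
    rng = random.Random(seed)
    words = range(1, n_words + 1)
    lexicon = {}
    for w in words:
        candidates = [v for v in words if v != w]   # every word except w itself
        lexicon[w] = set(rng.sample(candidates, n_links))
    return lexicon

# A 1000-word lexicon with three associates per word, as in Table 2.
lexicon = build_fixed_lexicon(n_words=1000, n_links=3, seed=1)
```

Changing n_links to four, five, six or more gives the other models mentioned in the text.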


© Servicio de Publicaciones. Universidad de Murcia. All rights reserved. 


IJES, vol. 7 (2), 2007, pp. 1-20 




Simulating Word Association in an L2 




word 29      15   123   135   138   742   881
word 367     29   421   435   567   665   678
word 456     71   138   156   489   543   820
word 552     81   140   172   495   681   729
word 699     10   259   273   682   695   891

Table 3: A simulated trial.


In each trial of the simulator, the program mirrored our original study by randomly selecting a
set of five stimulus words, and looking for an associational link between them. An example
of a trial of this sort is shown in Table 3. The table contains a set of five stimulus words
- word 29, word 367, word 456, word 552 and word 699 - each of which is associated with six
other words.

In our original paper, we programmed the simulator to register a hit if one of the five 
stimulus words also appeared in the association list of one of the other stimulus words. In 
Table 3, for example, word 29 occurs in the association list of word 367, and the program 
would therefore register a hit for this trial. By running lots of trials, typically a thousand, it is 
relatively straightforward to estimate the probability of at least one hit being registered for a 
random set of five target words. 
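
The conservative hit rule and the trial-counting procedure just described can be sketched as follows. The function names direct_hit and estimate_p_hit are hypothetical, and the data are simply the five association lists shown in Table 3; the sketch assumes a lexicon stored as a mapping from each word to a set of associates.

```python
import random

def direct_hit(stimuli, lexicon):
    """Conservative rule: a hit is registered when one stimulus word
    appears in the association list of another stimulus word."""
    return any(a in lexicon[b] for a in stimuli for b in stimuli if a != b)

def estimate_p_hit(lexicon, n_trials=1000, set_size=5, seed=None):
    """Monte Carlo estimate of the probability of at least one hit
    in a randomly selected stimulus set."""
    rng = random.Random(seed)
    words = list(lexicon)
    hits = sum(direct_hit(rng.sample(words, set_size), lexicon)
               for _ in range(n_trials))
    return hits / n_trials

# The five stimulus words of Table 3 with their association lists.
table3 = {
    29:  {15, 123, 135, 138, 742, 881},
    367: {29, 421, 435, 567, 665, 678},
    456: {71, 138, 156, 489, 543, 820},
    552: {81, 140, 172, 495, 681, 729},
    699: {10, 259, 273, 682, 695, 891},
}

# Word 29 is listed under word 367, so this trial counts as a hit.
table3_is_hit = direct_hit(list(table3), table3)
```

With a full 1000-word lexicon in place of the Table 3 fragment, estimate_p_hit returns the proportion of trials in which at least one hit was registered.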

However, a number of critics argued that our method of determining a hit in these
simulations was a very conservative one, and they made a very good case for adopting a
different approach, arguing that alternative definitions of a hit were more plausible than the
one we had adopted. For example, in Table 3, word 138 appears as an associate of both word
29 and word 456, and we might want to argue that these two stimulus words are linked by this
common associate. If word 29 were BIRD, word 456 were ROCKET, and word 138 were
FLY, it would be plausible to argue that BIRD and ROCKET might be identified as
associates, even though neither appears in the associate list of the other. In the second paper
in this series (Wilks, Meara, and Wolter, 2005), we therefore examined the extent to which the
results of a simulation could be affected by different ways of identifying a "hit" in the
five-word task.

It is obvious with hindsight that adopting a more lenient approach to identifying a hit
in a set of stimulus words will have a dramatic impact on the likelihood of a hit being
registered. Wilks, Meara, and Wolter (2005) examined four different ways of identifying a hit,
and concluded that more lenient methods of identifying a hit had significant consequences for
identifying systematic differences between L2 speakers and native speakers. In these models,
the probability of registering a hit for a set of five target words was surprisingly high, even
when the number of associates for each word was fairly small. Wilks, Meara, and Wolter
(2005) concluded, somewhat pessimistically, that it might be extremely difficult to move from
raw data like the data reported in Wilks and Meara's original experiments, to more general
theoretical claims about the way L2 lexicons grow in complexity.


III. MODELLING LEXICAL STRUCTURE 

In this paper, I will consider a second set of problems which arose in the discussion of our 
original paper. One of the main objections which appeared in these discussions concerned the 
way we had operationalised the structure of the lexicon itself. In the work described in Wilks 
and Meara (2002) and Wilks, Meara, and Wolter (2005), we had modelled our lexicons using 
random associations between words. Each word was randomly connected to N other words, 
selected by chance, but the number of associations was the same for each word. This 
introduced a level of uniformity into our models which is probably not characteristic of real 
lexicons. Real lexicons, it might be argued, are not likely to be ordered in this way. 
Specifically, we could argue that the number of associations linked to each word is not likely 
to be uniform, and probably varies quite a lot. Further, we could argue that the associations 
made between words are not likely to be random. At the very least, some words are more 
likely to be involved in an association link than others are, and we need to find a way of 
reflecting this in our simulations. Finally, a number of people suggested to us that we needed
to look at small world lexicons (Ferrer i Cancho & Solé, 2001; Watts, 2003; Watts & Strogatz,
1998) in which a few densely structured associative clusters are connected by a small number
of long-range associations between the clusters.

These ideas are explored in the rest of this section. In the simulations reported in this 
paper, I have set aside the question of how we decide whether an association among the five 
stimulus words is identified. In order to simplify things, I have only used the second 
procedure developed in Wilks, Meara, and Wolter (2005). This is the model in which a set of 
stimulus words generates a hit whenever one of the stimulus words occurs as an associate of 
one of the other stimulus words, or any two of the stimulus words share a common 
association. This implementation is not the most generous of the models discussed in Wilks,
Meara, and Wolter (2005), but it is considerably less conservative than Wilks, Meara, and
Wolter's original model, and it is probably a good approximation of how people make
associations in real life. It is possible that the variations on lexical structure modelled in this 
paper may in fact interact in complex ways with the method we use to determine whether a 
stimulus set contains a hit or not. This problem will not be discussed here, however, as the 
arguments are sufficiently complex already. 
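
The hit rule used in this paper (a direct link between two stimulus words, or an associate shared by two stimulus words) might be sketched as below. The name lenient_hit is hypothetical. The toy lexicon reuses the BIRD, ROCKET and FLY example from Section II; NEST, MOON, TABLE and CHAIR are invented purely for illustration.

```python
from itertools import combinations

def lenient_hit(stimuli, lexicon):
    """Register a hit when two stimulus words are directly linked,
    or when they share at least one common associate."""
    for a, b in combinations(stimuli, 2):
        directly_linked = a in lexicon[b] or b in lexicon[a]
        share_associate = bool(lexicon[a] & lexicon[b])
        if directly_linked or share_associate:
            return True
    return False

# Toy lexicon: BIRD and ROCKET are not directly linked, but both list FLY.
toy = {
    "BIRD":   {"FLY", "NEST"},
    "ROCKET": {"FLY", "MOON"},
    "TABLE":  {"CHAIR"},
}

bird_rocket_hit = lenient_hit(["BIRD", "ROCKET", "TABLE"], toy)
```

Under the conservative rule this stimulus set would not count as a hit, because neither BIRD nor ROCKET appears in the other's association list.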

III.1. Variable random models

Figure 1 and Figure 2 show the effect of allowing the number of associations linked with
each word in the lexicon to vary. Figure 1 recaps the data presented in Wilks and Meara
(2002). It shows the probability of a hit being returned for a set of five randomly selected
target words, when all the words in the model have the same number of associations, and this
figure is allowed to vary from 4 to 20. Figure 2 reports data from a set of simulations in
which a slightly different approach is used. In these simulations, the total number of
associational links in the model lexicon is determined, but these links are randomly
distributed across the entire lexicon. The number of associations any single word can have is
not predetermined, and there is no limit on the number of associations any one word can
have. In spite of this change of approach, the data reported in Figure 2 are for all intents and
purposes identical to the more constrained data reported in Figure 1. This suggests that the
over-riding factor that determines the probability of a hit being registered in a small stimulus
set is the total number of associations in the lexicon, rather than the number of associations
linked to any one word. As we shall see, this change of focus from the individual word to the
total number of connections in the network turns out to be more interesting than it looks at
first glance.
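
A lexicon of the second kind, in which only the total number of links is fixed, could be built along the following lines. This is a sketch under stated assumptions: the name build_total_links_lexicon is hypothetical, and the exclusion of self-links and repeated links is my own choice rather than something the text specifies.

```python
import random

def build_total_links_lexicon(n_words=1000, total_links=10_000, seed=None):
    """Scatter total_links association links at random over the lexicon.
    No word has a predetermined or limited number of associates."""
    rng = random.Random(seed)
    lexicon = {w: set() for w in range(1, n_words + 1)}
    placed = 0
    while placed < total_links:
        source = rng.randint(1, n_words)
        target = rng.randint(1, n_words)
        # Assumption: skip self-links and links that already exist.
        if source != target and target not in lexicon[source]:
            lexicon[source].add(target)
            placed += 1
    return lexicon

lexicon = build_total_links_lexicon(n_words=1000, total_links=10_000, seed=1)
links_per_word = [len(associates) for associates in lexicon.values()]
```

Here links_per_word varies from word to word, while the sum over the whole lexicon is fixed at total_links.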



Figure 1: Random Links Fixed N


Figure 2: Random Links Variable N

This general conclusion that the main factor affecting the occurrence of a “hit” is the total
number of associational connections in the model lexicon is also reinforced by data from two
further models. Figure 3 shows data from a set of simulations in which the number of
associations per word parameter is treated as a maximum, rather than a fixed value. This
allows the number of associations that any one word has to vary between zero and the
maximum value defined by the parameter. In practice, this means that the average number of
associations is about half the maximum, with a relatively wide standard deviation. This
arrangement is illustrated in Table 3. Here, the maximum number of associations is six. The
individual words vary from zero to six associations, and the average number of associations
for the five words shown is three.


word 0001    0194  0456  0341  0222
word 0002    0033  0006  0519  0343  0931  0945
word 0003
...
word 0999    0438  0456
word 1000    0229  0179  0202

Table 3: A model lexicon where the number of associations is variable up to a maximum.


At first sight, the data line in Figure 3 looks rather different from the data reported in
Figure 1, but it is, in fact, just a stretched out version of the same basic pattern. When the
maxlinks parameter is set at 20, the average number of links per word is about 10, and in a
model containing 1000 words, this means that the total number of associational links is about
10,000. The probability of registering a hit when the maxlinks parameter is set to 20 - .68 - is
almost identical to the value of .69 that we found in the previous model when we had a total
of 10,000 associational links in the model. Similarly, when the maxlinks parameter is set at 8,
the average number of links per word is about 4, and the total number of links in the network
is around 4000. The probability of registering a hit when the maxlinks parameter is set to 8
should therefore be around 0.2 - the value returned when the number of fixed links equals 4 in
Figure 1. And indeed this turns out to be the case.
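
The arithmetic in this paragraph can be checked with a small sketch of the maxlinks model. The function name build_maxlinks_lexicon is hypothetical, and drawing each word's link count uniformly between zero and the maximum is an assumption; the text says only that the count varies between zero and the maximum.

```python
import random

def build_maxlinks_lexicon(n_words=1000, max_links=20, seed=None):
    """Each word receives between 0 and max_links random associates.
    Assumption: the count is drawn uniformly from 0..max_links."""
    rng = random.Random(seed)
    lexicon = {}
    for w in range(1, n_words + 1):
        n = rng.randint(0, max_links)
        chosen = set()
        while len(chosen) < n:
            v = rng.randint(1, n_words)
            if v != w:                 # assumption: no self-links
                chosen.add(v)
        lexicon[w] = chosen
    return lexicon

# maxlinks=20 should give roughly 10 links per word, about 10,000 in all;
# maxlinks=8 should give roughly 4 links per word, about 4,000 in all.
total_20 = sum(len(a) for a in build_maxlinks_lexicon(max_links=20, seed=1).values())
total_8 = sum(len(a) for a in build_maxlinks_lexicon(max_links=8, seed=1).values())
```

The exact totals depend on the random seed, but they should fall close to half of n_words times max_links.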

Figure 3: Max links=N


A very similar data pattern also emerges if we allow the number of association links for each
word to vary, but impose a relatively tight constraint on the amount of variation allowed.
The data in Figure 4 show what happens when the number of associations linked to each
word is allowed to vary by plus or minus three. Thus, when the average number of links is
ten, some words may have as few as seven links, while others may have as many as thirteen.
Over a large lexicon, the number of association links in the whole lexicon is approximately
the same as the number we get with a fixed number of links, and once again, we find that the
data in Figure 4 are almost identical to the data reported in Figure 1 and Figure 2.
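
The plus-or-minus-three model can be sketched in the same way. The name build_spread_lexicon and the spread parameter are hypothetical, and the uniform draw within the allowed band is an assumption.

```python
import random

def build_spread_lexicon(n_words=1000, mean_links=10, spread=3, seed=None):
    """Each word receives mean_links plus or minus up to `spread` associates.
    Assumption: the count is uniform on mean_links-spread..mean_links+spread,
    so mean_links should be at least as large as spread."""
    rng = random.Random(seed)
    lexicon = {}
    for w in range(1, n_words + 1):
        n = rng.randint(mean_links - spread, mean_links + spread)
        chosen = set()
        while len(chosen) < n:
            v = rng.randint(1, n_words)
            if v != w:                 # assumption: no self-links
                chosen.add(v)
        lexicon[w] = chosen
    return lexicon

lexicon = build_spread_lexicon(mean_links=10, spread=3, seed=1)
total_links = sum(len(a) for a in lexicon.values())
```

With a mean of ten, every word has between seven and thirteen links, and total_links stays close to the 10,000 links of the fixed model.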



Figure 4: Mean links=N

Figure 5 illustrates a slightly more complicated way of constraining the associational
links between words. In this set of simulations, we have imposed the constraint that
associations take into account the order in which words are acquired. The constraint applied
here is that association links are normally allowed only with a word that appears earlier in the
dictionary. That is, word 300 can associate with any earlier word (words 1-299), but its
association list cannot contain words which occur later in the dictionary (words 301-1000).
This constraint has the effect of giving more weight to words which appear early in the
dictionary list, so that word 20, for example, is more likely to appear as an associate than
word 920. In this way, the model loosely reflects what might happen in a lexicon where
developmental processes are a dominant factor.

Obviously, we cannot apply this constraint to all words: if we did, then word 0001 would 
have no words that it could associate to, word 0002 would only be able to associate to word 
0001, and so on. Therefore, in the simulations reported in Figure 5, the first 50 words are 
allowed to associate freely with each other. This gives us a small core of fifty words which 
are highly interconnected and a large number of other words which are loosely connected to 
this central core. The choice of 50 words for this central core is an arbitrary one, but as far as 
I can see, other values for the size of the core work in essentially the same way. 
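
The ordering constraint and the freely associating core might be implemented roughly as follows. The function name build_ordered_lexicon is hypothetical, and restricting core words to associates inside the core is my reading of "allowed to associate freely with each other".

```python
import random

def build_ordered_lexicon(n_words=1000, n_links=5, core_size=50, seed=None):
    """Words in the core associate freely with other core words;
    every later word may only associate with words earlier in the list."""
    rng = random.Random(seed)
    lexicon = {}
    for w in range(1, n_words + 1):
        if w <= core_size:
            # Core word: any other word in the core is a possible associate.
            pool = [v for v in range(1, core_size + 1) if v != w]
        else:
            # Later word: only words that appear earlier in the dictionary.
            pool = list(range(1, w))
        lexicon[w] = set(rng.sample(pool, min(n_links, len(pool))))
    return lexicon

lexicon = build_ordered_lexicon(n_words=1000, n_links=5, core_size=50, seed=1)

def times_listed(word, lexicon):
    """How often a word appears in other words' association lists."""
    return sum(word in associates for associates in lexicon.values())
```

Comparing times_listed for an early word and a late word shows the weighting effect described above: early words have far more opportunities to be chosen as associates.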



Figure 5: A central core and ordered links 


Again, rather surprisingly perhaps, the results of these simulations look broadly similar to the
random data reported in Figure 1. The probability of a hit being registered in Figure 5 is
generally slightly higher than the probabilities reported in Figure 1. This reflects the fact that
in this model, the probability of a word appearing in a list of associates is not equal for all
words: words that appear early in the lexicon list are slightly more likely to be recorded as
associates than words which appear later in the word list. In real life, this would be equivalent
to RED (a high frequency word) being more likely to appear as an associate than PINK (a
lower frequency word), and RED would thus be more likely to appear twice in a set of
associations than PINK would. This seems like a plausible explanation for the increased hit
rate in this model.

III.2. Small world models 

So far, then, the data we have reported suggest that changing the way we model the 
underlying structure of our association network does not make a huge difference to the results 
returned by the simulator. The biggest difference occurs when we impose a developmental 
ordering constraint on the formation of associations, but even with this tight constraint, the 
data generated by the simulator does not change very much. The probability of a hit being 
registered increases slightly, but in other respects the simulations do not result in a radically 
different set of outcomes. Overall, the data reported in figures 1 to 5 are remarkably 
consistent. 

However, a number of writers have recently suggested that random structures are not a
good model for lexical networks, and that human lexicons may exhibit the properties of a
"small world" (Ferrer i Cancho & Solé, 2001; Watts & Strogatz, 1998). The main feature of
small world networks is that most nodes in the network are connected to a small number of
closely related nodes, and only a few connections go from one of these clusters to another
cluster. An example of this type of structure is shown in Figure 6:



Figure 6: A small world lexicon 




Paul Meara 


In this illustration the words in the lexical network, represented by small squares, are 
organised into sixteen clusters, where each member of the cluster is linked immediately to 
several other members in the cluster. A small number of links join the clusters to each other, 
but these long-range links are few and appear to be less important than the links which 
operate within each cluster. 

What effect does a structure of this sort have on our simulated data? The answer to
this question is not straightforward, as it is not immediately obvious what characteristics of
small world lexicons we need to program into our simulations. As a first stab, however, we
devised a model in which all the words in our lexicon are grouped into 20 clusters, each
consisting of 50 words. Within these clusters, associations are formed at random. In addition
to these clusters, we also built in fifty long-range links which went from one
cluster to another. In this way of modelling a small world lexicon, our lexicon looks
something like Table 4.


word 0001    0002  0012  0015  0020
word 0002    0013  0003  0001  0015
word 0003    0033  0010  0001  0009
word 0004    0012  0031  0042  0014
word 0005    0003  0028  0047  0020  0656
...
word 0101    0115  0118  0103  0102
word 0102    0122  0117  0124  0116
word 0103    0133  0114  0145  0128
word 0104    0141  0116  0138  0130
word 0105    0105  0101  0104  0110  0235
...
word 0996    0981  0985  0989  0991
word 0997    0985  0996  0999  0984
word 0998    0972  0983  0961  0974
word 0999    0962  0974  0985  0990  0123
word 1000    0982  0993  0999  0962

Table 4: A fragment of a small world lexicon.

In this illustration, each word has four associates, with each of the four associates coming
from a set of fifty words. All the links in the first set come from the first fifty words in the 
lexicon, while all the links in the second set are taken from the range 101 to 150, and all the 
links in the final set come from the range 951 to 1000. In addition, some words have an extra 
association, which links the cluster to another cluster. 

The data shown in Figure 7 come from a set of simulations where the lexicon is
structured into 20 clusters of 50 words, the number of long-range links is set at 50, and the
number of links allowed to each word varies from four to 20.
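
A first-stab small world lexicon of this kind can be sketched as below. The function name build_small_world and its parameter names are hypothetical, and the way the long-range links are placed (random source and target words in different clusters) is an assumption, since the text gives only their number.

```python
import random

def build_small_world(n_clusters=20, cluster_size=50, short_links=4,
                      long_links=50, seed=None):
    """Clusters of consecutive words with random within-cluster links,
    plus a fixed number of long-range links between clusters."""
    rng = random.Random(seed)
    n_words = n_clusters * cluster_size
    lexicon = {}
    for w in range(1, n_words + 1):
        start = ((w - 1) // cluster_size) * cluster_size + 1   # first word of w's cluster
        pool = [v for v in range(start, start + cluster_size) if v != w]
        lexicon[w] = set(rng.sample(pool, short_links))
    added = 0
    while added < long_links:
        a = rng.randint(1, n_words)
        b = rng.randint(1, n_words)
        different_cluster = (a - 1) // cluster_size != (b - 1) // cluster_size
        if different_cluster and b not in lexicon[a]:
            lexicon[a].add(b)
            added += 1
    return lexicon

lexicon = build_small_world(n_clusters=20, cluster_size=50,
                            short_links=4, long_links=50, seed=1)
```

With these settings most words have four associates, all from their own block of fifty, and a few words carry one extra link into another cluster, as in Table 4.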



Figure 7: p(hit) in a small world network 
clusters=20, longlinks=50, shortlinks 4 to 20 

Surprisingly, this way of simulating a lexicon generates data which look very different from 
the data in figures 1 through 5. Although the probability of a hit rises slightly as the number 
of links per word grows, the rate of growth is painfully slow. It appears to reach an asymptote 
at around twenty links per word, when the probability of a hit is just over 40%. 

On reflection, it is not difficult to figure out why these figures look so very different 
from our earlier simulations. In the small world simulation, the critical factors must be the 
size of the clusters, and the probability of a stimulus set containing two words from the same 
cluster. If the clusters are small, then the probability of getting two stimulus words from the 
same cluster is also small; on the other hand, if the cluster size is large, then the probability of 
getting two words in a stimulus set from the same cluster increases. At the same time, if the 
clusters are small, then the chances of two words from the same cluster sharing a common 
associate will increase, while if the clusters are very large, then the chances of a common 
associate for two words from the same cluster will decrease. This suggests that there may be a
complex interplay between cluster size and number of associations per word in small world
lexicons. The role of the long range associations is more difficult to predict, however, and this
suggests that it would be worthwhile to look in more detail at a range of different small world
models, where cluster size and number of long range links are varied. A preliminary
exploration of these issues is reported in Figures 8, 9 and 10.
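
The first of these factors, the probability that a stimulus set contains two words from the same cluster, can be worked out directly. The sketch below is my own calculation rather than part of the original simulations; the function name p_same_cluster is hypothetical, and it assumes equal-sized clusters and stimulus words drawn without replacement.

```python
def p_same_cluster(n_words=1000, cluster_size=50, set_size=5):
    """Probability that at least two of set_size randomly drawn words
    belong to the same cluster (equal-sized clusters assumed)."""
    p_all_different = 1.0
    for i in range(set_size):
        # After i words from i different clusters, the next word must
        # avoid the i * cluster_size words in those clusters.
        allowed = max(0, n_words - i * cluster_size)
        p_all_different *= allowed / (n_words - i)
    return 1.0 - p_all_different

p_50 = p_same_cluster(n_words=1000, cluster_size=50)   # 20 clusters of 50
p_10 = p_same_cluster(n_words=1000, cluster_size=10)   # 100 clusters of 10
```

For 20 clusters of 50 words this comes to about 0.41, which is in the neighbourhood of the plateau of just over 40% described for Figure 7. That is suggestive rather than conclusive: when long-range links are scarce, a hit is only likely if two stimulus words fall in the same cluster, so this probability acts as a rough ceiling.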

Figure 8 shows the probability of a hit being registered in a set of 5 randomly selected 
stimulus words from a small world lexicon where cluster size varies from 10 to 50 words, and 
the number of links per word is 5 or 8. In this illustration, the number of long range links is 
held constant at 50. Figure 8 suggests that cluster size has some impact on the likelihood of a 
hit being registered, as long as the clusters are relatively small. When the clusters become 
larger, the number of associations per word appears to emerge as the more important factor. 
Thus, for small cluster sizes, there is very little difference between a lexicon where each word 
has five or eight links, but for larger clusters, there does appear to be a difference which can 
be ascribed to the number of associates each word is allocated. 

Figure 9 shows the effect of varying the number of long range links when cluster size 
and number of links per word are held constant. In this illustration, the number of links per
word is held constant at eight, and data from two cluster sizes are reported, namely clusters of
20 or 50 words. Surprisingly, varying the number of long range links from 0 to 50 seems to
make very little difference to the outcome of these simulations. Increasing the number of long 
range links over this range increases the probability of a hit being registered by only a tiny 
amount. Cluster size is a much more important factor, with larger clusters returning a higher 
probability of registering a hit. 

Figure 10 shows a more detailed examination of the interaction between cluster size 
and the number of long range associations. In this figure, cluster size is allowed to vary from 
10 to 125 words, while the number of long links is allowed to vary from 10 to 1000. In all 
cases, the number of association links allowed to each word is held constant at 8. The data
suggest very clearly that long range links have only a minuscule effect on the probability of a
hit being registered as long as the number of long range links remains small relative to the
overall size of the lexicon.

Figure 8: small world networks
longlinks=50 shortlinks=5 or 8 
cluster size = 10 to 50 


Figure 9: small world networks 
shortlinks=8 longlinks=0 to 50 
cluster size =20 or 50 


Figure 10: Small world lexicons 
shortlinks=8 cluster size 10 to 125 
long links= 10 to 1,000 


However, there is a tantalising hint in the data shown in Figure 10 that the probability of a hit
being registered might increase if the number of long-range links is allowed to increase until
these links form a significant proportion of the total number of links in the model lexicon.
Figure 11 examines this possibility. This illustration shows data from a small world model in
which we have twenty clusters of fifty words. Within each cluster, each word has five links to
other words in the cluster. On top of this basic structure, I have varied the number of long
range links from zero to 15,000. What Figure 11 shows is that the number of long range links
is indeed the critical factor in determining the probability of a hit being registered, and the
overall shape of the curve in Figure 11 is again very close to the data reported in our earlier
figures. Bearing in mind that the within-cluster associations in this model add another 5,000
links to the total (each of the 1000 words has five within-cluster links), it probably makes
sense to see the data in Figure 11 as covering the range 5,000 to 20,000 links, and if we
recalibrate the data in this way, we again have a data set which matches almost exactly the
data reproduced in Figure 2.


IV. DISCUSSION 

With the data reported in Figure 11, we have, it seems, pretty much returned to our starting 
point. We have examined the behaviour of a number of differently structured model lexicons, 
and we have discovered that the local structure of these models has a negligible effect on the 
probability of a pair of associated words being found in a random selection of five words.
The only factor which emerges as important in these models is the overall number of 
associational links in the lexicon. This is a surprising finding, and it has a number of 
interesting implications. 
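
One way to see why the total number of links matters so much is a back-of-envelope approximation for a random lexicon. The sketch below is my own and is not taken from the simulations: the function name approx_p_hit is hypothetical, it assumes the hit rule used in this paper (a direct link or a shared associate), and it treats all word pairs as independent, which is only approximately true.

```python
def approx_p_hit(total_links, n_words=1000, set_size=5):
    """Rough probability of at least one hit in a random stimulus set,
    for a random lexicon with the given total number of links.
    Independence between pairs is assumed throughout."""
    k = total_links / n_words                      # mean links per word
    n_pairs = set_size * (set_size - 1) // 2       # 10 pairs for five words
    # No stimulus word is listed under another (two directions per pair).
    p_no_direct = (1 - k / (n_words - 1)) ** (2 * n_pairs)
    # No pair of stimulus words shares an associate.
    p_no_shared = (1 - k / n_words) ** (k * n_pairs)
    return 1 - p_no_direct * p_no_shared

p_10000 = approx_p_hit(10_000)
p_4000 = approx_p_hit(4_000)
```

Under these assumptions the approximation gives roughly 0.70 for 10,000 links and roughly 0.21 for 4,000 links, in the same region as the simulated values of .69 and about 0.2 quoted in Section III.1. Nothing in the formula depends on how the links are shared out among individual words, only on their total.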

Wilks and Meara (2002) interpreted their original data as showing that words in their
native speakers' lexicons had a greater number of connections than did words in their
non-native speakers' lexicons. The differences were not great - native speakers were judged to
have about eleven links per word, while non-native speakers were judged to have about seven.
Meara (1996) and Meara and Wolter (2004) had suggested that a measure of this sort might
form the basis of a measure of lexical organisation in an L2, and that this measure
might be used to supplement the more commonly used measures of vocabulary depth.
Unfortunately, the small size of this difference between L1 speakers and L2 speakers in
Wilks, Meara, and Wolter's (2005) study, and the difficulty they found in interpreting their
data meaningfully, left them very pessimistic about the possibility of developing a measure of
this sort. However, if the local structure of their lexicons is not the critical factor which
determines how speakers behave in our experimental task, then this pessimistic conclusion
deserves to be revisited.



Figure 11: small world lexicons 
shortlinks=5 cluster size=50
longlinks=0 to 15,000 


Figure 12 shows the data first reported in Figure 2, with the addition of the real-life 
experimental data reported in Wilks, Meara, and Wolter (2005). In this figure, the upper 
horizontal line shows the probability of a hit in the native speaker data, while the lower 
horizontal line indicates the probability of a hit in the learner data. Wilks, Meara, and Wolter 
interpreted these data in terms of the number of links per word, but in the light of the 
discussion in the previous section, it now seems obvious that we should reinterpret the data in 
terms of the overall number of associational links in the Ss' lexicons. Using this approach, 
Wilks, Meara, and Wolter's data suggest that L1 speakers have about 11,500 links, while the 
non-native speakers have around 7,500 links, giving us a difference of about 4,000 links. This 
figure is much easier to interpret than the mean links per word figure we used in our earlier 
paper, and it is easy to see what the data might mean in real life. More importantly, the total 
links figure is very much easier to incorporate into a model of lexical growth - the process of 
adding a new link is straightforward and transparent in a way that our original concept of 
mean links per word was not. 
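
The relationship between total links and hit probability can be illustrated with a minimal sketch. It is not the simulation code used in this study: it assumes a lexicon of 1,000 words, undirected links placed uniformly at random, and a hit defined as any linked pair among five randomly drawn words. All function names are hypothetical, and the closed-form estimate treats the ten word pairs in a set as independent, which is only an approximation.

```python
import random


def build_random_lexicon(n_words, n_links, rng):
    """Return a set of undirected links (i, j), i < j, placed at random."""
    links = set()
    while len(links) < n_links:
        i, j = rng.sample(range(n_words), 2)
        links.add((min(i, j), max(i, j)))
    return links


def five_word_hit(links, n_words, rng, set_size=5):
    """One trial: draw set_size words; report whether any pair is linked."""
    words = sorted(rng.sample(range(n_words), set_size))
    return any((a, b) in links
               for idx, a in enumerate(words)
               for b in words[idx + 1:])


def hit_probability(n_words, n_links, n_trials, seed=0):
    """Monte Carlo estimate of the probability of a hit on one item."""
    rng = random.Random(seed)
    links = build_random_lexicon(n_words, n_links, rng)
    hits = sum(five_word_hit(links, n_words, rng) for _ in range(n_trials))
    return hits / n_trials


def analytic_hit_probability(n_words, n_links, set_size=5):
    """Approximation: each pair in the set is linked independently."""
    pairs_total = n_words * (n_words - 1) // 2
    pairs_in_set = set_size * (set_size - 1) // 2
    link_density = n_links / pairs_total
    return 1 - (1 - link_density) ** pairs_in_set
```

Calling `hit_probability(1000, 7500, 20000)` and `hit_probability(1000, 11500, 20000)` gives estimates for the two link totals discussed above under these assumptions. Because the sketch depends only on the lexicon size and the total number of links, it makes the single-parameter interpretation concrete.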



Figure 12: Random Links Variable N and 
data from Wilks Meara and Wolter 

This conclusion breathes new life into Meara's suggestion that it might be possible to 
construct a measure of overall lexical organisation which could be used to study 
the way learners' lexicons change as their L2 proficiency improves over time, and we hope to 
be able to report progress in this area in future studies. What seems to be needed is a 
standardised instrument such as the five-word task described in Wilks, Meara and Wolter, 
and an agreed way of interpreting the results this instrument generates in terms of the overall 
number of connections the target lexicon contains. Neither of these requirements looks 
impossibly difficult to achieve. 

However, the main point of this paper goes rather further than these practical 
suggestions. In a paper which was highly critical of some earlier simulation work that we had 
carried out, Laufer (2005) dismissed simulations as a "convenient escape from the real 
world". Although Laufer's highly critical approach to our work is an extreme position on this 
issue, she voiced what seems to be a widely held view among SLA researchers that 
simulations are simply not an appropriate way of researching the processes of acquiring a 
second language. We have always argued that this view is short-sighted. We believe that 
simulations can throw valuable light on the way we interpret the data generated in 
experiments with real subjects, and that simulations can help us ask better research questions 
and help us design better research instruments to answer them. This paper has been a good 
example of this type of interaction between simulations and “real world” research. What we 
have done in these simulations is to take a commonly used metaphor about the way lexicons 
are structured, and explore how far we can go with it using a simple data collection 
instrument, the five-word task. It turns out that the metaphor does not work quite as we might 
expect. The metaphor leads us to expect that local organisation is the most important feature 
of a network, but working with the metaphor in detail has forced us to reach a different 
conclusion: overall structure seems to be more important than local structure - at least as far 
as the five-word task is concerned. Significantly, this overall structure can be conveniently 
summarised by a single parameter - the total number of links in the network - and for practical 
purposes, it seems as though we might be able to ignore other factors that looked as though 
they might be important, but turn out not to be. Put simply, the simulations suggest that our 
initial approach to the question of lexical organisation in L2 speakers may have been 
unnecessarily complex. 

Additionally, the simulations reported here provide us with some valuable feedback 
about the way our experimental task works. The simulations suggest that the five-word task 
should work well over a wide range of proficiency levels - only when the number of links 
reaches very high levels do simulations of the five-word task fail to show an increase in the 
number of hits registered. This level seems to be well above what we find even with native 
speaker subjects, so the lack of sensitivity at this level is not likely to be a serious problem. 
The simulations also seem to indicate that the five-word task might be capable of registering 
relatively small amounts of growth in lexical structure, particularly at low levels of L2 
proficiency. The data suggest that a 50 item test should be sensitive enough to register an 
increase of 500 associational links in a small lexicon, and a 100 item test should be 
considerably more sensitive to small changes in the number of links in the target vocabulary. 
This level of sensitivity is probably good enough to register changes in a lexicon over 
relatively short periods of time, such as the ones typically used in classroom research. This is 
an important consideration, since some other widely used measures do not seem to be 
sensitive in this way (Meara, 2005). 
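
One rough way to think about sensitivity is to compare the expected change in test score with the sampling variation of a single administration. The sketch below is an illustration only, not the procedure used in this paper. It assumes a 1,000-word lexicon with randomly placed undirected links, treats the ten word pairs in each item as independent, and models the score on an n-item test as binomial. The function names and the link totals in the usage note are hypothetical.

```python
import math


def hit_prob(n_links, n_words=1000, set_size=5):
    """Approximate hit probability for one five-word item (random links)."""
    pairs_total = n_words * (n_words - 1) / 2
    pairs_in_set = set_size * (set_size - 1) // 2
    return 1 - (1 - n_links / pairs_total) ** pairs_in_set


def expected_score_shift(links_before, links_after, n_items, n_words=1000):
    """Return (expected change in hits, binomial SD of the baseline score)."""
    p_before = hit_prob(links_before, n_words)
    p_after = hit_prob(links_after, n_words)
    shift = n_items * (p_after - p_before)
    baseline_sd = math.sqrt(n_items * p_before * (1 - p_before))
    return shift, baseline_sd
```

For example, `expected_score_shift(2000, 2500, 50)` and `expected_score_shift(2000, 2500, 100)` return the expected gain in hits for a hypothetical increase of 500 links on a 50 item and a 100 item test. Doubling the number of items doubles the expected gain but increases the standard deviation only by a factor of the square root of two, which is why the longer test is the more sensitive of the two.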

V. CONCLUSION 

To summarise, the data reported here is a good example of the way simulations interact both 
with theory and with practical data collection. Far from being "a convenient escape from the 
real world", simulations offer a way of thinking about the data collected in real experiments, 
and suggest ways of improving the way we collect this data in the first place. Work of this 
sort inevitably introduces some simplifications, but to be frank, most research does this too. 
The difference is that in simulation work the simplifications are explicit and overt rather than 
hidden and covert. In good simulation research, we can explore the implications of making 
these simplifications in a way which is just not possible for logistical reasons when we work 
with real subjects in experimental settings. 

I hope that readers of this paper will share my view that the approach I have used here 
is both illuminating and exciting, and that the ideas I have explored here will perhaps make 
some critical researchers think again about practical applications of simulations in SLA 
research. 